KIVA - AN INTRODUCTION

Kiva Microfunds (commonly known by its domain name, Kiva.org) is a 501(c)(3) non-profit organization that allows people to lend money via the Internet to low-income entrepreneurs and students in over 80 countries. Kiva’s mission is “to connect people through lending to alleviate poverty.” Kiva operates two models: Kiva.org and KivaZip.org. Kiva.org relies on a network of field partners to administer the loans on the ground. Kiva is headquartered in San Francisco, California.



OBJECTIVE

Estimation of the Welfare Level of Borrowers, Categorized by Areas with Shared Economic and Demographic Characteristics



OBJECTIVE OF NOTEBOOK

An analysis of the data provided by Kiva, including a discussion of ideas for additional data sources, supported by graphs and plots to make the results more interactive.



A Note from a Beginner

I am a new member of the Kaggle community, and this is my first Kernel. While every effort has been made to avoid misinterpreting the data, inadvertent errors might have crept in. Constructive criticism and honest reviews are welcome.



Libraries

library(readr)

packages <- c("data.table", "ggplot2", "dplyr", "mosaic", "magrittr",
              "grid", "cowplot", "gridExtra", "corrplot",
              "RColorBrewer", "gmodels")
lapply(packages, require, character.only = TRUE)
## Loading required package: data.table
## Loading required package: ggplot2
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: mosaic
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'mosaic'
## Loading required package: magrittr
## Loading required package: grid
## Loading required package: cowplot
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'cowplot'
## Loading required package: gridExtra
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
## Loading required package: corrplot
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'corrplot'
## Loading required package: RColorBrewer
## Loading required package: gmodels
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'gmodels'
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] FALSE
## 
## [[5]]
## [1] TRUE
## 
## [[6]]
## [1] TRUE
## 
## [[7]]
## [1] FALSE
## 
## [[8]]
## [1] TRUE
## 
## [[9]]
## [1] FALSE
## 
## [[10]]
## [1] TRUE
## 
## [[11]]
## [1] FALSE
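
The list above returns FALSE for the packages that could not be loaded (mosaic, cowplot, corrplot and gmodels in this run). As a minimal sketch, the missing packages can be listed by name instead of by position; `installed` and `missing_packages` are illustrative names, not part of the original notebook.

```r
# Packages requested in the notebook's `packages` vector
packages <- c("data.table", "ggplot2", "dplyr", "mosaic", "magrittr",
              "grid", "cowplot", "gridExtra", "corrplot",
              "RColorBrewer", "gmodels")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that are not available in this environment
missing_packages <- names(installed)[!installed]
print(missing_packages)
```

Anything listed in `missing_packages` would need to be installed before code that depends on it can run.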
library(wordcloud) #  wordcloud
library(DT)       # table format display of data
library(leaflet) # maps

library(igraph) #  graphs
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:igraph':
## 
##     groups
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Loading the dataset

loans <- read_csv("kiva_loans.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   id = col_integer(),
##   funded_amount = col_double(),
##   loan_amount = col_double(),
##   partner_id = col_double(),
##   posted_time = col_datetime(format = ""),
##   disbursed_time = col_datetime(format = ""),
##   funded_time = col_datetime(format = ""),
##   term_in_months = col_double(),
##   lender_count = col_integer(),
##   date = col_date(format = "")
## )
## See spec(...) for full column specifications.
location <- read_csv("kiva_mpi_region_locations.csv")
## Parsed with column specification:
## cols(
##   LocationName = col_character(),
##   ISO = col_character(),
##   country = col_character(),
##   region = col_character(),
##   world_region = col_character(),
##   MPI = col_double(),
##   geo = col_character(),
##   lat = col_double(),
##   lon = col_double()
## )
loan_theme <-  read_csv("loan_theme_ids.csv")
## Parsed with column specification:
## cols(
##   id = col_integer(),
##   `Loan Theme ID` = col_character(),
##   `Loan Theme Type` = col_character(),
##   `Partner ID` = col_double()
## )
loan_themes_region <- read_csv("loan_themes_by_region.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `Partner ID` = col_integer(),
##   number = col_integer(),
##   amount = col_integer(),
##   lat = col_double(),
##   lon = col_double()
## )
## See spec(...) for full column specifications.

Null Value Identifications

To avoid discrepancies in the analysis, the null values in each dataset need to be identified first. The tables below list every attribute that contains null values, together with the percentage of its values that are null.
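
The same calculation is repeated for each table, so it could be wrapped in a small helper. The sketch below is one way to do that; `na_percentage` and the toy data frame are hypothetical and only illustrate the computation on invented values.

```r
# Hypothetical helper: percentage of missing values per column,
# restricted to columns that contain at least one NA
na_percentage <- function(df) {
  share <- colMeans(is.na(df))
  share <- share[share > 0]
  data.frame(Percentage = share * 100)
}

# Toy data frame standing in for one of the Kiva tables (values invented)
toy <- data.frame(
  id     = 1:4,
  region = c("A", NA, "C", NA),
  amount = c(100, 250, NA, 75)
)

result <- na_percentage(toy)
print(result)
```

Columns without missing values (here `id`) are dropped, so the resulting table only shows the attributes that need attention.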


Null Value identifications in loans

# Share of missing values per column, keeping only columns that have any
na_share <- colMeans(is.na(loans))
na_share <- na_share[na_share > 0]

# Convert to a percentage and display as an interactive table
d <- data.frame(Percentage = na_share * 100)
dt <- datatable(d, filter = 'bottom', options = list(pageLength = 8))
dt

Null Value identification in loan_theme

# Share of missing values per column, keeping only columns that have any
na_share <- colMeans(is.na(loan_theme))
na_share <- na_share[na_share > 0]

# Convert to a percentage and display as an interactive table
d <- data.frame(Percentage = na_share * 100)
dt <- datatable(d, filter = 'bottom', options = list(pageLength = 8))
dt

Null Values in Location

# Share of missing values per column, keeping only columns that have any
na_share <- colMeans(is.na(location))
na_share <- na_share[na_share > 0]

# Convert to a percentage and display as an interactive table
d <- data.frame(Percentage = na_share * 100)
dt <- datatable(d, filter = 'bottom', options = list(pageLength = 8))
dt

The Grant of Loans - Country Specific

# Count loans per country and keep the ten countries with the most loans
loans %>%
  group_by(country) %>%
  summarise(Count = n()) %>%
  arrange(desc(Count)) %>%
  ungroup() %>%
  mutate(country = reorder(country, Count)) %>%
  head(10) %>%

  ggplot(aes(x = country, y = Count)) +
  geom_bar(stat = 'identity', colour = "white", fill = "darkgreen") +
  geom_text(aes(x = country, y = 1, label = paste0("(", Count, ")")),
            hjust = 0, vjust = .5, size = 4, colour = 'black',
            fontface = 'bold') +
  labs(x = 'country',
       y = 'Count') +
  theme_light()

The chart shows that the Philippines ranks highest among borrower countries by number of loans.



Sector Wise Analysis on the Basis of Amount of Loans

In this dataset, the Agriculture and Food sectors hold the top positions in terms of loan amount.
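
A ranking of this kind can be computed by summing `loan_amount` within each `sector`. The sketch below uses base R on a small invented table (`toy_loans`), so the figures are illustrative only and do not reproduce the Kiva results.

```r
# Toy stand-in for the `loans` table (values invented for illustration)
toy_loans <- data.frame(
  sector      = c("Agriculture", "Food", "Retail", "Agriculture", "Food", "Agriculture"),
  loan_amount = c(500, 300, 200, 700, 400, 100)
)

# Total loan amount per sector
sector_totals <- aggregate(loan_amount ~ sector, data = toy_loans, FUN = sum)

# Number of loans per sector, matched by sector name
loan_counts <- table(toy_loans$sector)
sector_totals$loans <- as.vector(loan_counts[as.character(sector_totals$sector)])

# Largest total first
sector_totals <- sector_totals[order(-sector_totals$loan_amount), ]
print(sector_totals)
```

Applied to the real data, the same steps would use `loans` in place of `toy_loans`.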



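Analysis of Loans - Female Borrowers by Country

# Only loans whose borrower_genders value is exactly "female" are labelled
# "female"; every other non-missing value is labelled "male"
loans$borr <- if_else((loans$borrower_genders == "female"), "female", "male")
# Number of "female" loans per country
k <- data.frame(table(loans$country[loans$borr == "female"]))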


p <- plot_ly(k, labels = ~Var1, values = ~Freq, type = 'pie') %>%
  layout(
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
p

The Philippines and Kenya are among the leading countries by number of loans to female borrowers.
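
The comparison with the exact string "female" only picks up loans with a single female borrower. Assuming that group loans store one gender per borrower in a comma-separated string (for example "female, male"), a finer classification could look like the sketch below. `classify_genders` is a hypothetical helper and the input vector is invented.

```r
# Classify a borrower_genders string; assumes group loans list one
# gender per borrower, separated by commas (e.g. "female, male")
classify_genders <- function(x) {
  parts <- strsplit(x, ",\\s*")
  vapply(parts, function(g) {
    if (length(g) == 0 || all(is.na(g))) return(NA_character_)
    if (all(g == "female")) return("female")
    if (all(g == "male")) return("male")
    "mixed"
  }, character(1))
}

# Invented examples: single borrowers, groups, and a missing value
genders <- c("female", "female, female", "male", "female, male", NA)
print(classify_genders(genders))
```

With this approach, all-female groups are counted as "female" rather than falling into the "male" category.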



The regularities and irregularities of payback

Irregular Payment Modes

 k<- data.frame(table(loans$country[loans$repayment_interval == "irregular"]))


p <- plot_ly(k, labels = ~Var1, values = ~Freq, type = 'pie') %>%
  layout(
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
p

The Philippines accounts for a large share of the loans with an irregular repayment interval.



Monthly Repayment Modes

 k<- data.frame(table(loans$country[loans$repayment_interval == "monthly"]))


p <- plot_ly(k, labels = ~Var1, values = ~Freq, type = 'pie') %>%
  layout(
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
p

Cambodia and Kenya are among the top countries for loans paid back in monthly instalments.



Bullet Repayment Mode

In banking and finance, a bullet loan is a loan where the entire principal, and sometimes the principal and interest, is due at the end of the loan term. The same applies to a bullet bond. A bullet loan can be a mortgage, bond, note or any other type of credit. It is also sometimes known as an EMI-free loan.
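
To make the difference concrete, the sketch below compares a bullet schedule with equal monthly instalments for the same principal. It is a simplified illustration that ignores interest and fees; `repayment_schedule` and the figures are hypothetical and not taken from the Kiva data.

```r
# Principal-only repayment schedules (interest and fees are ignored here)
repayment_schedule <- function(principal, term_in_months,
                               mode = c("monthly", "bullet")) {
  mode <- match.arg(mode)
  if (mode == "bullet") {
    # Nothing is repaid until the final month
    c(rep(0, term_in_months - 1), principal)
  } else {
    # Equal instalments every month
    rep(principal / term_in_months, term_in_months)
  }
}

monthly <- repayment_schedule(1200, 12, "monthly")
bullet  <- repayment_schedule(1200, 12, "bullet")

print(monthly)
print(bullet)
```

Both schedules repay the same total, but the bullet schedule concentrates the whole amount in the last month.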

 k<- data.frame(table(loans$country[loans$repayment_interval == "bullet"]))


p <- plot_ly(k, labels = ~Var1, values = ~Freq, type = 'pie') %>%
  layout(
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
p

Colombia and Nigeria, which do not feature prominently anywhere else, end up at the top of this list.
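
Pie charts of raw counts tend to be dominated by countries with many loans overall. As a complementary view, the share of each repayment interval within a country can be computed with a cross-tabulation. The sketch below uses an invented table (`toy_loans`), so the numbers are illustrative only.

```r
# Toy stand-in for the `loans` table (values invented for illustration)
toy_loans <- data.frame(
  country = c("Philippines", "Philippines", "Philippines",
              "Kenya", "Kenya", "Colombia"),
  repayment_interval = c("irregular", "irregular", "monthly",
                         "monthly", "monthly", "bullet")
)

# Loans per country and repayment interval
counts <- table(toy_loans$country, toy_loans$repayment_interval)

# Row-wise proportions: each country's rows sum to 1
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Reading across a row shows how a single country's loans are split between the repayment modes, independent of how many loans that country has.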

Sector wise classification of loans based on top grossers

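# Ten countries with the most loans (the same ranking used in the country chart above)
top10 <- names(head(sort(table(loans$country), decreasing = TRUE), 10))
top10_loans <- loans[loans$country %in% top10, ]
top10_loans$country <- as.factor(top10_loans$country)
top10_loans$sector <- as.factor(top10_loans$sector)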

ggplot(top10_loans, aes(country)) +
  geom_bar(aes(fill=sector), width = 0.8, col='black') + theme(axis.text.x = element_text(angle=65, vjust=0.6))

Cartographic Representation of the Top Grossers of Loans

leaflet(loan_themes_region) %>% addProviderTiles("Esri.NatGeoWorldMap") %>%
  # Circle radius (in metres) scales with the amount column
  addCircles(lng = ~lon, lat = ~lat, radius = ~(amount / 10),
             color = "red") %>%
  # Start with a view of the whole world
  setView(lng = 0, lat = 0, zoom = 2)
## Warning in validateCoords(lng, lat, funcName): Data contains 2074 rows with
## either missing or invalid lat/lon values and will be ignored
